Skip to content

ZEPPELIN-3375: Make PySparkInterpreter extends PythonInterpreter#2919

Closed
zjffdu wants to merge 1 commit intoapache:masterfrom
zjffdu:ZEPPELIN-3375
Closed

ZEPPELIN-3375: Make PySparkInterpreter extends PythonInterpreter#2919
zjffdu wants to merge 1 commit intoapache:masterfrom
zjffdu:ZEPPELIN-3375

Conversation

@zjffdu
Copy link
Contributor

@zjffdu zjffdu commented Apr 11, 2018

What is this PR for?

This PR is trying to remove the code duplication between PySparkInterpreter and PythonInterpreter. So here's the main things this PR did:

  • PySparkInterpreter extends PythonInterpreter
  • PySparkInterpreterTest extends PythonInterpreterTest so that we can verify PySparkInterpreter can do whatever PythonInterpreter can do
  • Move interpreter/lib/python/backend_zinline.py and interpreter/lib/python/mpl_config.py into python module, so that python module can ship these resources together.

What type of PR is it?

[ Improvement | Refactoring]

Todos

  • - Task

What is the Jira issue?

How should this be tested?

  • CI pass

Screenshots (if appropriate)

Questions:

  • Does the licenses files need update? No
  • Is there breaking changes for older versions? No
  • Does this needs documentation? No

@zjffdu zjffdu force-pushed the ZEPPELIN-3375 branch 2 times, most recently from 6116344 to d6481ea Compare April 11, 2018 05:34
@zjffdu
Copy link
Contributor Author

zjffdu commented Apr 11, 2018

@felixcheung @Leemoonsoo Could you help review it ? Thanks

//TODO(zjffdu) don't do hard code on py4j here
File py4jDestFile = new File(pythonWorkDir, "py4j-src-0.9.2.zip");
FileUtils.copyURLToFile(getClass().getClassLoader().getResource(
"python/py4j-src-0.9.2.zip"), py4jDestFile);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, 2.3 is running with Py4J 0.10.6

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this detect any mismatch here? check spark version or something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is fine to use py4j 0.9.2 here for IPythonInterpreter, as for IPySparkInterpreter it would use the py4j of spark instead of py4j 0.9.2

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder why Spark doesn't ship py4j zip file version-agnostic? Filed https://issues.apache.org/jira/browse/SPARK-23965

Copy link
Member

@HyukjinKwon HyukjinKwon Apr 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is a strong reason to rename or make a link for Spark's Py4J within Spark. Also, to be clear, I think It's an orthogonal issue with the current change here, if I am not mistaken.

throw new IOException("Fail to run shell commands: " + StringUtils.join(commands, " "));
}
logger.info("Complete shell commands: " + StringUtils.join(commands, " "));
return outputGobbler.getOutput();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought we had some launcher wrapper for something like this?

try {
interpreter.close();
} catch (InterpreterException e) {
LOGGER.warn("Fail to close interpreter: " + interpreter.getClassName());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would the exception stack be useful?
just to LOGGER.warn( .... , e);?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed


jsc = intp.getJavaSparkContext()

if sparkVersion.isImportAllPackageUnderSparkSql():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not keeping this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is only used when spark version is lower than 1.3. There's many code in zeppelin that is for specific old spark version. I don't think we need them, actually we have no test for them, no one know whether they works or note. I think it is time to zeppelin drop support for old version of spark. But it require more work, will do it in another PR for this.

try {
bootstrapInterpreter("python/zeppelin_pyspark.py");
} catch (IOException e) {
e.printStackTrace();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

log?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

except:
raise Exception(traceback.format_exc())
if not isForCompletion:
exception = traceback.format_exc()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a comment what this is looking for and what it looks like typically?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

gateway = JavaGateway(client, auto_convert = True)
intp = gateway.entry_point
# redirect stdout/stderr to java side so that PythonInterpreter can capture the python execution result
output = Logger()
Copy link
Member

@felixcheung felixcheung Apr 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are problems reported with these names too common, conflict with existing variables, output, gateway, client etc, we should name these uniquely if we could - even if "temporary" since in python variables have global scope

Copy link
Contributor Author

@zjffdu zjffdu Apr 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It if fine to use them, because they are not in the same namespace of user code (they are not visible to users).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure - this sets globally right? for example z is accessible from user code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

User code is in namespace _zcUserQueryNameSpace instead of global namespace https://github.com/zjffdu/zeppelin/blob/ZEPPELIN-3375/python/src/main/resources/python/zeppelin_python.py#L91

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it

return None
else:
objectDefList = execResult['objectDefList']
return [completion for completion in execResult['objectDefList'] if completion.startswith(methodName)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

startswith -I think a lot times partial match - not necessarily from the beginning, can be good?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be, there's a lot work to do for improving the code completion. This PR is large, I don't want to cover too much thing in this single PR.

@zjffdu zjffdu force-pushed the ZEPPELIN-3375 branch 3 times, most recently from 91ef36d to 4d30a5e Compare April 13, 2018 09:30
@zjffdu
Copy link
Contributor Author

zjffdu commented Apr 16, 2018

Will merge if no more comment

@asfgit asfgit closed this in 0a97446 Apr 16, 2018
jwagun pushed a commit to jwagun/zeppelin that referenced this pull request Apr 23, 2018
### What is this PR for?
This PR is trying to remove the code duplication between PySparkInterpreter and PythonInterpreter. So here's the main things this PR did:
* PySparkInterpreter extends PythonInterpreter
* PySparkInterpreterTest extends PythonInterpreterTest so that we can verify PySparkInterpreter can do whatever PythonInterpreter can do
* Move interpreter/lib/python/backend_zinline.py and interpreter/lib/python/mpl_config.py into python module, so that python module can ship these resources together.

### What type of PR is it?
[ Improvement | Refactoring]

### Todos
* [ ] - Task

### What is the Jira issue?
* https://issues.apache.org/jira/browse/ZEPPELIN-3375

### How should this be tested?
* CI pass

### Screenshots (if appropriate)

### Questions:
* Does the licenses files need update? No
* Is there breaking changes for older versions? No
* Does this needs documentation? No

Author: Jeff Zhang <zjffdu@apache.org>

Closes apache#2919 from zjffdu/ZEPPELIN-3375 and squashes the following commits:

738c6c5 [Jeff Zhang] ZEPPELIN-3375. Make PySparkInterpreter extends PythonInterpreter
zjffdu added a commit to zjffdu/zeppelin that referenced this pull request Jun 8, 2018
This PR is trying to remove the code duplication between PySparkInterpreter and PythonInterpreter. So here's the main things this PR did:
* PySparkInterpreter extends PythonInterpreter
* PySparkInterpreterTest extends PythonInterpreterTest so that we can verify PySparkInterpreter can do whatever PythonInterpreter can do
* Move interpreter/lib/python/backend_zinline.py and interpreter/lib/python/mpl_config.py into python module, so that python module can ship these resources together.

[ Improvement | Refactoring]

* [ ] - Task

* https://issues.apache.org/jira/browse/ZEPPELIN-3375

* CI pass

* Does the licenses files need update? No
* Is there breaking changes for older versions? No
* Does this needs documentation? No

Author: Jeff Zhang <zjffdu@apache.org>

Closes apache#2919 from zjffdu/ZEPPELIN-3375 and squashes the following commits:

738c6c5 [Jeff Zhang] ZEPPELIN-3375. Make PySparkInterpreter extends PythonInterpreter

(cherry picked from commit 0a97446)
zjffdu added a commit to zjffdu/zeppelin that referenced this pull request Jun 8, 2018
This PR is trying to remove the code duplication between PySparkInterpreter and PythonInterpreter. So here's the main things this PR did:
* PySparkInterpreter extends PythonInterpreter
* PySparkInterpreterTest extends PythonInterpreterTest so that we can verify PySparkInterpreter can do whatever PythonInterpreter can do
* Move interpreter/lib/python/backend_zinline.py and interpreter/lib/python/mpl_config.py into python module, so that python module can ship these resources together.

[ Improvement | Refactoring]

* [ ] - Task

* https://issues.apache.org/jira/browse/ZEPPELIN-3375

* CI pass

* Does the licenses files need update? No
* Is there breaking changes for older versions? No
* Does this needs documentation? No

Author: Jeff Zhang <zjffdu@apache.org>

Closes apache#2919 from zjffdu/ZEPPELIN-3375 and squashes the following commits:

738c6c5 [Jeff Zhang] ZEPPELIN-3375. Make PySparkInterpreter extends PythonInterpreter

(cherry picked from commit 0a97446)
zjffdu added a commit to zjffdu/zeppelin that referenced this pull request Jun 8, 2018
This PR is trying to remove the code duplication between PySparkInterpreter and PythonInterpreter. So here's the main things this PR did:
* PySparkInterpreter extends PythonInterpreter
* PySparkInterpreterTest extends PythonInterpreterTest so that we can verify PySparkInterpreter can do whatever PythonInterpreter can do
* Move interpreter/lib/python/backend_zinline.py and interpreter/lib/python/mpl_config.py into python module, so that python module can ship these resources together.

[ Improvement | Refactoring]

* [ ] - Task

* https://issues.apache.org/jira/browse/ZEPPELIN-3375

* CI pass

* Does the licenses files need update? No
* Is there breaking changes for older versions? No
* Does this needs documentation? No

Author: Jeff Zhang <zjffdu@apache.org>

Closes apache#2919 from zjffdu/ZEPPELIN-3375 and squashes the following commits:

738c6c5 [Jeff Zhang] ZEPPELIN-3375. Make PySparkInterpreter extends PythonInterpreter

(cherry picked from commit 0a97446)
asfgit pushed a commit that referenced this pull request Jun 8, 2018
This PR is trying to remove the code duplication between PySparkInterpreter and PythonInterpreter. So here's the main things this PR did:
* PySparkInterpreter extends PythonInterpreter
* PySparkInterpreterTest extends PythonInterpreterTest so that we can verify PySparkInterpreter can do whatever PythonInterpreter can do
* Move interpreter/lib/python/backend_zinline.py and interpreter/lib/python/mpl_config.py into python module, so that python module can ship these resources together.

[ Improvement | Refactoring]

* [ ] - Task

* https://issues.apache.org/jira/browse/ZEPPELIN-3375

* CI pass

* Does the licenses files need update? No
* Is there breaking changes for older versions? No
* Does this needs documentation? No

Author: Jeff Zhang <zjffdu@apache.org>

Closes #2919 from zjffdu/ZEPPELIN-3375 and squashes the following commits:

738c6c5 [Jeff Zhang] ZEPPELIN-3375. Make PySparkInterpreter extends PythonInterpreter

(cherry picked from commit 0a97446)
(cherry picked from commit 595d45b)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants